Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition

نویسندگان

  • Md. Jahangir Alam
  • Patrick Kenny
  • Pierre Dumouchel
  • Douglas D. O'Shaughnessy
چکیده

This work presents a noise spectrum estimator based on the Gaussian mixture model (GMM)-based speech presence probability (SPP) for robust speech recognition. Estimated noise spectrum is then used to compute a subband a posteriori signal-to-noise ratio (SNR). A sigmoid shape weighting rule is formed based on this subband a posteriori SNR to enhance the speech spectrum in the auditory domain, which is used in the Mel-frequency cepstral coefficient (MFCC) framework for robust feature, denoted here as Robust MFCC (RMFCC) extraction. The performance of the GMM-SPP noise spectrum estimator-based RMFCC feature extractor is evaluated in the context of speech recognition on the AURORA-4 continuous speech recognition task. For comparison we incorporate six existing noise estimation methods into this auditory domain spectrum enhancement framework. The ETSI advanced frontend (ETSI-AFE), power normalized cepstral coefficients (PNCC), and robust compressive gammachirp cepstral coefficients (RCGCC) are also considered for comparison purposes. Experimental speech recognition results show that, in terms of word accuracy, RMFCC provides an average relative improvements of 8.1%, 6.9% and 6.6% over RCGCC, ETSI-AFE, and PNCC, respectively. With GMM-SPP -based noise estimation method an average relative improvement of 3.6% is obtained over other six noise estimation methods in terms of word recognition accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Enhancement using Laplacian Mixture Model under Signal Presence Uncertainty

In this paper an estimator for speech enhancement based on Laplacian Mixture Model has been proposed. The proposed method, estimates the complex DFT coefficients of clean speech from noisy speech using the MMSE  estimator, when the clean speech DFT coefficients are supposed mixture of Laplacians and the DFT coefficients of  noise are assumed zero-mean Gaussian distribution. Furthermore, the MMS...

متن کامل

Improving the performance of MFCC for Persian robust speech recognition

The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...

متن کامل

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...

متن کامل

Channel identification and spectrum estimation for robust automatic speech recognition

A feature estimation technique is proposed for speech signals that are corrupted by both additive and convolutive noises via combining channel identification with power spectrum estimation. A correlation-matching algorithm is developed for channel identification, and a Gaussian mixture density model of speech DFT spectra is formulated for estimation of speech power spectra. Cepstral features of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014